autodoc: Change dictionary sort order #22441

Merged: 1 commit, Aug 6, 2024

khwilliamson (Contributor) commented Jul 29, 2024

This brings the sort order more in line with Data::Dumper sorting.

Upper/lower case continues not to matter, and numbers continue to come after letters, so that ckWARN2() comes after plain ckWARN().

Non-leading underscores now come before letters, so that ck_warner comes before ckWARN.

Leading underscores now come after non-leading ones, so that aMY_CXT and aMY_CXT_ come before _aMY_CXT.

jkeenan (Contributor) commented Jul 29, 2024

> This makes this more in line with Data::Dumper sorting.

To review this pull request, I built blead through make test_prep and did the same with a branch built from your pull request (rebased on blead). I then compared the pod/perlapi.pod files generated by each build.

> It changes numbers to come after letters, so that ckWARN2() comes after ckWARN().

Grepping the two generated files (as described below), I couldn't find any =item entries that illustrate this objective. How would I search the generated files to find examples of this objective?

> It changes non-leading underscores to come before letters, so that ABC_DEF comes before ABCDEF, as the former is likely to be seen as two words, and ABC should come before ABCD.

If I grepped for =item entries in the two files ...

ack '^=item C<[^>]*>' pod/perlapi.pod

... then took a diff of the two greps, then I observed:

663,667d662
< =item C<LONGDBLINFBYTES>
< =item C<LONGDBLMANTBITS>
< =item C<LONGDBLNANBYTES>
< =item C<LONG_DOUBLEKIND>
< =item C<LONG_DOUBLESIZE>
672a668,672
> =item C<LONG_DOUBLEKIND>
> =item C<LONG_DOUBLESIZE>
> =item C<LONGDBLINFBYTES>
> =item C<LONGDBLMANTBITS>
> =item C<LONGDBLNANBYTES>

... which appears to meet your objective.
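The comparison procedure described above can also be sketched in Perl. This is only an illustration: the sub name item_names is hypothetical, and the two strings are small stand-ins for the generated pod/perlapi.pod files, using names taken from the diff above rather than the real file contents.

```perl
use strict;
use warnings;

# Return the names of all '=item C<...>' entries in a POD string, in
# document order. The pattern mirrors the ack expression used above.
sub item_names {
    my ($pod) = @_;
    return $pod =~ /^=item C<([^>]*)>/mg;
}

# Hypothetical stand-ins for the two generated files (blead and branch).
my $blead_pod  = join "\n", map { "=item C<$_>" }
    qw(LONGDBLINFBYTES LONGDBLMANTBITS LONG_DOUBLEKIND LONG_DOUBLESIZE);
my $branch_pod = join "\n", map { "=item C<$_>" }
    qw(LONG_DOUBLEKIND LONG_DOUBLESIZE LONGDBLINFBYTES LONGDBLMANTBITS);

my @before = item_names($blead_pod);
my @after  = item_names($branch_pod);

# Report every position at which the two orderings disagree.
my @moved = grep { $before[$_] ne $after[$_] } 0 .. $#before;
printf "%d: %s -> %s\n", $_, $before[$_], $after[$_] for @moved;
```

Comparing by position is cruder than a real diff, but it is enough to spot which entries were reordered.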

> And it changes so leading underscores come after non-leading, so aTHX comes before _aTHX

Using the search procedure described above, I found in blead:

 4078 =head1 Concurrency
 4079 
 4080 =over 4
 4081 
 4082 =item C<aTHX>
 4083 
 4084 =item C<aTHX_>
 4085 
 4086 Described in L<perlguts>.
 4087 
 4088 =back
 4089 

... while in the branch I found:

 4052 =head1 Concurrency
 4053 
 4054 =over 4
 4055 
 4056 =item C<aTHX>
 4057 
 4058 =item C<aTHX_>
 4059 
 4060 Described in L<perlguts>.
 4061 
 4062 =back
 4063 

So there was no change in aTHX versus aTHX_. Can you provide an example of where this change took effect?

khwilliamson (Contributor, Author) commented

Sorry for my glib description. I forgot that this retained the existing sort order for numbers, where they come after letters. I changed the commit message to give real examples of the things that did change.

autodoc.pl Outdated
Comment on lines 2165 to 2191
-# Convert all digit sequences to same length with leading zeros, so for
-# example, 8 will compare less than 16 (using a fill length value that
-# should be longer than any sequence in the input).
+# Convert all digit sequences to be the same length with leading zeros, so
+# that, for example '8' will sort before '16' (using a fill length value
+# that should be longer than any sequence in the input).
 $a =~ s/(\d+)/sprintf "%06d", $1/ge;
 $b =~ s/(\d+)/sprintf "%06d", $1/ge;

-# Translate any underscores and digits so they compare after all Unicode
-# characters
-$a =~ tr[_0-9]/\x{110000}-\x{11000A}/;
-$b =~ tr[_0-9]/\x{110000}-\x{11000A}/;
+# Translate any underscores so they sort lowest. This causes 'word1_word2'
+# to sort before 'word1word2' for all words.
+# And translate any digits so they come after anything else. This causes
+# digits to sort highest)
+$a =~ tr[_0-9]/\0\x{110000}-\x{110009}/;
+$b =~ tr[_0-9]/\0\x{110000}-\x{110009}/;

+# Then move leading underscores to the end, translating them to above
+# everything else. This causes '_word_' to compare just after 'word_'
+$a .= "\x{11000A}" x length $1 if $a =~ s/ ^ (\0+) //x;
+$b .= "\x{11000A}" x length $1 if $b =~ s/ ^ (\0+) //x;

-use feature 'state';
 # Modify \w, \W to reflect the changes.
-state $ud = '\x{110000}-\x{11000A}'; # xlated underscore, digits
-state $w = "\\w$ud"; # new \w string
+use feature 'state';
+state $w = "\\w\0\x{110000}-\x{11000A}"; # new \w string
 state $mod_w = qr/[$w]/;
 state $mod_W = qr/[^$w]/;

-# Only \w for initial comparison
-my $a_only_word = uc($a =~ s/$mod_W//gr);
-my $b_only_word = uc($b =~ s/$mod_W//gr);

-# And not initial nor interior underscores nor digits (by squeezing them
-# out)
-my $a_stripped = $a_only_word =~ s/ (*atomic:[$ud]+) (*pla: $mod_w ) //grxx;
-my $b_stripped = $b_only_word =~ s/ (*atomic:[$ud]+) (*pla: $mod_w ) //grxx;
+# Strip out \W.
+my $a_stripped = $a =~ s/$mod_W//gr;
+my $b_stripped = $b =~ s/$mod_W//gr;

All this duplicated code between $a and $b could go into a function.


changed
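
The refactored code is not shown in this thread. As a rough sketch of what moving the duplicated $a/$b transformations into one function might look like, the following builds a sort key from the steps in the reviewed hunk. The name sort_key is hypothetical. Folding case with lc and stripping \W before the translation are simplifications assumed here for illustration, not what autodoc.pl is confirmed to do.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparison key for one API name.
sub sort_key {
    my ($name) = @_;

    # Assumption: fold case first so upper/lower case does not matter.
    $name = lc $name;

    # Simplification: drop non-word characters before translating, so the
    # plain \W class is enough (the hunk strips them afterwards instead).
    $name =~ s/\W//g;

    # Pad digit sequences to a fixed width so '8' sorts before '16'.
    $name =~ s/(\d+)/sprintf "%06d", $1/ge;

    # Underscores become NUL (sort lowest); digits move above Unicode.
    $name =~ tr[_0-9]/\0\x{110000}-\x{110009}/;

    # Move leading underscores to the end, above everything else.
    $name .= "\x{11000A}" x length $1 if $name =~ s/ ^ (\0+) //x;

    return $name;
}

# Names taken from the examples in this thread.
my @names  = qw(ckWARN2 _aMY_CXT ckWARN aMY_CXT_ ck_warner aMY_CXT);
my %key    = map { $_ => sort_key($_) } @names;
my @sorted = sort { $key{$a} cmp $key{$b} } @names;

print join(" ", @sorted), "\n";
# prints: aMY_CXT aMY_CXT_ _aMY_CXT ck_warner ckWARN ckWARN2
```

Computing each key once and sorting on the cached keys avoids repeating the substitutions inside the comparator, which is otherwise called many times per name.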

This makes this more in line with Data::Dumper sorting.

upper/lower case continues to not matter, and numbers continue to come
after letters, so that ckWARN2() comes after plain ckWARN().

It changes non-leading underscores to come before letters, so that
ck_warner comes before ckWARN.

And it changes so leading underscores come after non-leading, so that
aMY_CXT and aMY_CXT_ come before _aMY_CXT.

khwilliamson merged commit 716d8ca into Perl:blead on Aug 6, 2024
33 checks passed
khwilliamson deleted the api_sort_order branch on August 6, 2024, 03:42